NHANES 2011-2014 Weartime Vignette

NHANES 2011-2014 Weartime Vignette

Background

When analyzing accelerometry data collected in free living settings, weartime estimation is a crucial first step. If an individual isn’t actually wearing her device, we don’t want to estimate her physical activity as zero! In analyses of free-living accelerometry data, researchers typically define a weartime criteria and exclude individuals who do not meet that criteria. However, there is little consensus on weartime-based exclusion criteria, and the choice of criteria will impact the estimation of physical activity.1 In addition to widespread differences in exclusion criteria, there are many different methods to estimate wear from accelerometry data.

Weartime in NHANES 2011-2014

In the NHANES 2011-2014 data, weartime estimation was performed on the raw data using a machine learning algorithm.2

More details of the algorithm are available in the NHANES documentation. In the PAXPREDM column of the PAXMIN files, each minute of the day for each individual is assigned a wear prediction:

  • 1: wake wear
  • 2: sleep wear
  • 3: nonwear
  • 4: unknown

The purpose of this document is to explore the distribution of these wear predictions and dive deeper into the data for individuals with high amounts of wake, sleep, nonwear, and unknown predictions.

Code
library(tidyverse)
library(paletteer)
library(ggridges)
library(ggplot2)
library(lubridate)
library(scales)
library(htmlwidgets)
library(htmltools)

# demographics data 
demo = readRDS(here::here("lily", "data", "covariates_mortality_G_H_tidy.rds"))

# just get a few variables we need 
demo_small = 
  demo %>% 
  select(SEQN, data_release_cycle, gender, age_in_years_at_screening)

# day summaries for all individuals 
all_days = readr::read_csv(here::here("lily", "data", "all_days.csv.gz")) %>% 
  select(SEQN, PAXDAYM, MIMS_weartotal, MIMS_waketotal, starts_with("WTPRED"))

# join 
all_days = 
  all_days %>% 
  mutate(across(starts_with("WTPRED"), ~ifelse(is.na(.x), 0, .x))) %>% 
  left_join(demo_small, by = "SEQN")

# people with > 22 hr wake 
wake_ids = c(83233, 78159, 69691, 81998, 75556, 82037, 75220, 68655, 71457, 63922)
wake_wear_sample = 
  all_days %>% 
  filter(WTPRED_1 + WTPRED_2 + WTPRED_4 >= 1368) %>% 
  filter(WTPRED_1 >= 22 * 60) %>% 
  filter(SEQN %in% wake_ids) %>% 
  mutate(type = "wake") %>% 
  mutate(id_day = paste0(SEQN, "_", PAXDAYM))

# people with > 22 hr sleep 
sleep_ids = c(80983, 62695, 70542, 64023, 78962, 78526, 79602, 68970,81530, 76711)
sleep_wear_sample = 
  all_days %>% 
  filter(WTPRED_1 + WTPRED_2 + WTPRED_4 >= 1368) %>% 
  filter(WTPRED_2 >= 22 * 60) %>% 
  filter(SEQN %in% sleep_ids) %>% 
  mutate(type = "sleep") %>% 
  mutate(id_day = paste0(SEQN, "_", PAXDAYM))
  # sample_n(size = 10) 
  # pull(SEQN)

unknown_sample = 
  all_days %>% 
  filter(WTPRED_1 + WTPRED_2 + WTPRED_4 >= 1368) %>% 
  arrange(desc(WTPRED_4)) %>% 
  slice(1:8) %>% 
  mutate(type = "unknown") %>% 
  mutate(id_day = paste0(SEQN, "_", PAXDAYM))

all_samples = 
  bind_rows(sleep_wear_sample, wake_wear_sample, unknown_sample)

files = list.files(here::here("lily", "data", "min_files_sleep_wake"),
                   full.names = TRUE)

all_min_files = 
  map_dfr(.x = files,
          .f = readr::read_csv)

all_min_files = all_min_files %>% 
  select(SEQN, PAXDAYM, PAXPREDM, PAXMTSM, time)

all_min_files_joined = 
  all_min_files %>% 
  mutate(id_day = paste0(SEQN, "_", PAXDAYM)) %>% 
  filter(id_day %in% c(all_samples$id_day)) %>% 
  left_join(all_samples %>% select(-SEQN, -PAXDAYM), by = "id_day")

Distributions of wake, sleep, nonwear, unkown

First, we can make histograms of the number of minutes predicted for each wear category, across all individuals and days.

Code
labs = c("Wake wear", "Sleep wear", "Nonwear", "Unknown")
names(labs) = paste("WTPRED_", 1:4, sep = "")
all_days %>% 
  pivot_longer(cols = starts_with("WT")) %>% 
  ggplot(aes(x = value, fill = name))+
  geom_histogram(col = "black", binwidth = 60) + 
  facet_wrap(.~name, scales = "free_y", labeller = labeller(name = labs)) +
  theme_bw() + 
  scale_fill_paletteer_d("ggthemes::Superfishel_Stone")+
  scale_x_continuous(breaks=seq(0, 1440, 120),
                     labels = seq(0, 24, 2))+
  theme(legend.position = "none")+
  labs(x = "Hours", y = "Number of Days", title = "Distribution of Wear Predictions")

We see that for wake and sleep wear, the distributions are bimodal; there are peaks at zero and around 15 hours for wake and 8 hours for sleep.

However, we know that some days do not have a full 24 hours of observed data, as typically, the first and last day of data collection involve less than 24 hours of data by design. Thus, we can remove days for which WTPRED_1 + WTPRED_2 + WTPRED_3 + WTPRED_4 != 1440; i.e. days for which the sum of all the wear predictions is not equal to the total number of minutes in a day.

Code
all_days %>% 
  mutate(sum = WTPRED_1 + WTPRED_2 + WTPRED_3 + WTPRED_4) %>% 
  filter(sum == 1440) %>% 
  pivot_longer(cols = starts_with("WT")) %>% 
  ggplot(aes(x = value, fill = name))+
  geom_histogram(col = "black", binwidth = 60) + 
  facet_wrap(.~name, scales = "free_y", labeller = labeller(name = labs)) +
  theme_bw() + 
  scale_fill_paletteer_d("ggthemes::Superfishel_Stone")+
  scale_x_continuous(breaks=seq(0, 1440, 120),
                     labels = seq(0, 24, 2))+
  theme(legend.position = "none")+
  labs(x = "Hours", y = "Number of Days", title = "Distribution of Wear Predictions, Complete Days Only")

We still have some peaks at zero for wake and sleep wear, which is likely due to high amounts of nonwear. Two criteria that are often used are 95% estimated wear time3 or >=10 hours of (wake) wear.4 First, let’s filter out who do not have at least 1440*0.95 = 1368 minutes of sleep, wake, or unknown predictions.

Code
n_subs = all_days %>% 
    filter(WTPRED_1 + WTPRED_2 + WTPRED_4 >= 1368) %>% 
     select(SEQN) %>% distinct() %>% nrow()
all_days %>% 
  filter(WTPRED_1 + WTPRED_2 + WTPRED_4 >= 1368) %>% 
  pivot_longer(cols = starts_with("WT")) %>% 
  ggplot(aes(x = value, fill = name))+
  geom_histogram(col = "black", binwidth = 60) + 
  facet_wrap(.~name, scales = "free_y", labeller = labeller(name = labs)) +
  theme_bw() + 
  scale_fill_paletteer_d("ggthemes::Superfishel_Stone")+
  scale_x_continuous(breaks=seq(0, 1440, 120),
                     labels = seq(0, 24, 2))+
  theme(legend.position = "none")+
  labs(x = "Hours", y = "Number of Days", title = "Wear Predictions, Days with >=1368 Minutes of Wake, Sleep, or Unknown", subtitle = paste0("Individuals included = ", n_subs))

We can also look at the distributions for individuals with at least 10 hours of wake wear and 1440 total minutes:

Code
n_subs = all_days %>%
  filter(WTPRED_1 + WTPRED_2 + WTPRED_4 + WTPRED_3 == 1440) %>%
  filter(WTPRED_1 >= 10 * 60) %>% 
  select(SEQN) %>% distinct() %>% nrow()

all_days %>% 
  filter(WTPRED_1 + WTPRED_2 + WTPRED_4 + WTPRED_3 == 1440) %>%
  filter(WTPRED_1 >= 10 * 60) %>% 
  pivot_longer(cols = starts_with("WT")) %>% 
  ggplot(aes(x = value, fill = name))+
  geom_histogram(col = "black", binwidth = 60) + 
  facet_wrap(.~name, scales = "free_y", labeller = labeller(name = labs)) +
  theme_bw() + 
  scale_fill_paletteer_d("ggthemes::Superfishel_Stone")+
  scale_x_continuous(breaks=seq(0, 1440, 120),
                     labels = seq(0, 24, 2))+
  theme(legend.position = "none")+
  labs(x = "Hours", y = "Number of Days", title  = "Wear Predictions, Complete Days with >= 10 Hours of Wake Wear", subtitle = paste0("Individuals included = ", n_subs))

Finally, let’s combine the two and look at individuals with at least 1368 minutes of wake, sleep, or unknown and at least 7 hours of wake wear.

Code
n_subs = all_days %>%
  filter(WTPRED_1 + WTPRED_2 + WTPRED_4 >= 1368) %>%
  filter(WTPRED_1 >= 7 * 60) %>% 
  select(SEQN) %>% distinct() %>% nrow()

all_days %>% 
  filter(WTPRED_1 + WTPRED_2 + WTPRED_4 >= 1368) %>%
  filter(WTPRED_1 >= 7 * 60) %>% 
  pivot_longer(cols = starts_with("WT")) %>% 
  ggplot(aes(x = value, fill = name))+
  geom_histogram(col = "black", binwidth = 60) + 
  facet_wrap(.~name, scales = "free_y", labeller = labeller(name = labs)) +
  theme_bw() + 
  scale_fill_paletteer_d("ggthemes::Superfishel_Stone")+
  scale_x_continuous(breaks=seq(0, 1440, 120),
                     labels = seq(0, 24, 2))+
  theme(legend.position = "none")+
  labs(x = "Hours", y = "Number of Days", title  = ">= 7  Hours of Wake Wear, >= 1368 minutes of Wake, Sleep, Unknown", subtitle = paste0("Individuals included = ", n_subs))

Wake, sleep, unknown

For now, we’ll stick with the 1368 criteria. Let’s look at wake wear vs. sleep wear, colored by total MIMS5 for the day. We would expect most individuals to have between 6 and 10 hours of sleep and 14-18 hours of wake, but there are many days with wake and sleep hours outside of these boundaries. As expected, we see in general that higher wake is associated with higher total MIMS, and vice versa.

Code
all_1368 = 
  all_days %>% 
  filter(WTPRED_1 + WTPRED_2 + WTPRED_4 >= 1368)

all_1368 %>% 
  ggplot(aes(x = WTPRED_1, y = WTPRED_2, col =MIMS_weartotal))+
  geom_point(alpha = .2, size = .5) + 
  theme_bw() +
  labs(x = "Wake Wear (hours)", y = "Sleep Wear (hours)", title = "Wake Wear vs. Sleep Wear")+
  coord_equal()+
  scale_x_continuous(breaks=seq(0, 1440, 120),
                     labels = seq(0, 24, 2))+
  scale_y_continuous(breaks=seq(0, 1440, 120),
                     labels = seq(0, 24, 2))+
  scale_color_paletteer_c("ggthemes::Sunset-Sunrise Diverging", direction = 1,
                          name = "Total MIMS")+
  theme(legend.position = "bottom")+
  guides(color = guide_colorbar(barheight = unit(1, "cm"), barwidth = unit(5.5, "cm")))

Code
  # geom_vline(xintercept = c(14*60,18*60))+
  # geom_hline(yintercept = c(6*60,10*60))

>22 Hours of Sleep

Let’s look at people on the extreme ends of wake and sleep. We focus on the days with >22 hours of predicted sleep wear, and examine the distribution of days per individual with more than 22 hours of sleep. We see the of these days come from just one individual, but some individuals have 2 or more days with more than hours of sleep.

Code
all_1368 %>% 
  filter(WTPRED_2 >= 22 * 60) %>% 
  group_by(SEQN) %>% 
  count() %>% 
  ungroup() %>% 
  count(n) %>% 
  ggplot(aes(x = n, y = nn))+
  geom_bar(stat = "identity", col = "black", fill = "#FFAE34FF")+
  theme_bw() + 
  labs(x = "Number of Days per Person with > 22 Hours Sleep Wear",
       y = "Count")+
  scale_x_continuous(breaks = seq(1, 7, 1))+
  scale_y_continuous(breaks=seq(0,150,20))

We take a random sample of 10 individuals who had more than 22 hours of sleep wear on at least one day, and plot their MIMS at the minute level across the whole day, colored by the wear prediction. We also plot a horizontal line at 10.558 MIMS, which is an approximate boundary between sedentary and active in older adults.6

Code
ids = unique(all_min_files_joined %>%
                 filter(type == "sleep") %>%
                 pull(SEQN))
  for(subject in 1:length(ids)){
    subj = ids[subject]
    age = all_min_files_joined %>%
      filter(SEQN == subj) %>%
      slice(1) %>%
      pull(age_in_years_at_screening)
    p =
      all_min_files_joined %>%
      filter(SEQN == subj) %>%
      mutate(pred = factor(
        case_when(
          PAXPREDM == 1 ~ "Wake",
          PAXPREDM == 2 ~ "Sleep",
          PAXPREDM == 3 ~ "Nonwear",
          PAXPREDM == 4 ~ "Unknown"
        ),
        levels = c("Wake", "Sleep", "Nonwear", "Unknown")),
        minutes = paste0("Day ", PAXDAYM, " Sleep min = ", WTPRED_2)) %>%
      ggplot(aes(
        x = time,
        y = PAXMTSM,
        col = pred,
        group = 1
      )) +
      geom_line() +
      facet_wrap(. ~ minutes, scales = "free_x", nrow = 3) +
      labs(x = "Time", y = "MIMS", title = paste0("SEQN ", subj, " ", age, " y.o.")) +
      theme_bw() +
      scale_x_datetime(labels = date_format("%H:%M")) +
      theme(legend.position = "bottom") +
      scale_color_manual(
        name = "Prediction",
        values = c(
          "Wake" = "#6388B4FF",
          "Sleep" = "#FFAE34FF",
          "Nonwear" = "#EF6F6AFF",
          "Unknown" = "#8CC2CAFF"
        )
      )+
      geom_hline(yintercept = 10.558)
   print(p)
  }

>22 Hours of Wake

We make analogous plots for individuals with more than 22 hours of wake. With the exception of individual 78159, who is 4 years old, it appears most of these individuals were in fact active all day.

Code
all_1368 %>% 
  filter(WTPRED_1 >= 22 * 60) %>% 
  group_by(SEQN) %>% 
  count() %>% 
  ungroup() %>% 
  count(n) %>% 
  ggplot(aes(x = n, y = nn))+
  geom_bar(stat = "identity", col = "black", fill = "#6388B4FF")+
  theme_bw() + 
  labs(x = "Number of Days per Person with > 22 Hours Wake Wear",
       y = "Count")+
  scale_x_continuous(breaks = seq(1, 7, 1))+
  scale_y_continuous(breaks=seq(0,150,20))

Code
ids = unique(all_min_files_joined %>% 
          filter(type == "wake") %>% 
          pull(SEQN))
for(subject in 1:length(ids)){
  subj = ids[subject]
  age = all_min_files_joined %>% 
    filter(SEQN == subj) %>% 
    slice(1) %>% 
    pull(age_in_years_at_screening)
  p = 
    all_min_files_joined %>%
    filter(SEQN == subj) %>%
    mutate(pred = factor(
      case_when(
        PAXPREDM == 1 ~ "Wake",
        PAXPREDM == 2 ~ "Sleep",
        PAXPREDM == 3 ~ "Nonwear",
        PAXPREDM == 4 ~ "Unknown"
      ),
      levels = c("Wake", "Sleep", "Nonwear", "Unknown")),
      minutes = paste0("Day ", PAXDAYM, " Wake min = ", WTPRED_1)) %>% 
    ggplot(aes(
      x = time,
      y = PAXMTSM,
      col = pred,
      group = 1
    )) +
    geom_line() +
    facet_wrap(. ~ minutes, scales = "free_x", nrow = 3) +
    labs(x = "Time", y = "MIMS", title = paste0("SEQN ", subj, " ", age, " y.o.")) +
    theme_bw() +
    scale_x_datetime(labels = date_format("%H:%M")) +
    theme(legend.position = "bottom") +
    scale_color_manual(
      name = "Prediction",
      values = c(
        "Wake" = "#6388B4FF",
        "Sleep" = "#FFAE34FF",
        "Nonwear" = "#EF6F6AFF",
        "Unknown" = "#8CC2CAFF"
      )
    )+
    geom_hline(yintercept = 10.558)
  print(p)
}

Large amounts of unknown

We look at the distribution of time characterized as ‘unknown’, and also look at total unknown minutes vs. total MIMS, wake wear, and sleep wear. For the scatterplots, we look at only individuals with <=3 hours of unknown wear for easier visualization.

Code
all_1368 %>% 
  ggplot(aes(x = WTPRED_4))+
  geom_histogram(col = "black", binwidth = 30, fill = "#8CC2CAFF") + 
  theme_bw() + 
  scale_fill_paletteer_d("ggthemes::Superfishel_Stone")+
  scale_x_continuous(breaks=seq(0, 1440, 30),
                     labels = seq(0, 24, .5))+
  theme(legend.position = "none")+
  labs(x = "Hours", y = "Count", title = "Distribution of Unkown Wear")

Code
all_1368 %>% 
  filter(MIMS_weartotal < 60000) %>% 
  ggplot(aes(y = MIMS_weartotal, x = WTPRED_4))+
  geom_point(alpha = .1, size = .5) +
  theme_bw() + 
  geom_smooth()+
  scale_x_continuous(breaks=seq(0, 1440, 30),
                     labels = seq(0, 24, .5),
                     limits= c(0, 3*60))+
  labs(x = "Hours Unkown Wear", y = "Total MIMS", "Total MIMS vs. Unknown Wear")

Code
all_1368 %>% 
  ggplot(aes(y = WTPRED_1, x = WTPRED_4))+
  geom_point(alpha = .1, size = .5) +
  theme_bw() + 
  geom_smooth()+
  scale_x_continuous(breaks=seq(0, 1440, 30),
                     labels = seq(0, 24, .5),
                     limits= c(0, 3*60))+
  labs(x = "Hours Unkown Wear", y = "Hours Wake Wear", "Wake Wear vs. Unknown Wear")+
  scale_y_continuous(breaks=seq(0, 1440, 60),
                     labels = seq(0, 24, 1))

Code
all_1368 %>% 
  ggplot(aes(y = WTPRED_2, x = WTPRED_4))+
  geom_point(alpha = .1, size = .5) +
  theme_bw() + 
  geom_smooth()+
  scale_x_continuous(breaks=seq(0, 1440, 30),
                     labels = seq(0, 24, .5),
                     limits= c(0, 3*60))+
  labs(x = "Hours Unkown Wear", y = "Hours Sleep Wear", title = "Sleep Wear vs. Unkown Wear")+
  scale_y_continuous(breaks=seq(0, 1440, 60),
                     labels = seq(0, 24, 1))

It’s unclear how the ‘unknown’ wear should be treated. One future direction would be to look at the distribution of the length of the bouts of time characterized as unknown. If most of the bouts are short in length, perhaps unknown often happens in transition between sleep/wake, or somewhat randomly throughout the day. For now, we can look at a few individuals with the highest amount of unknown wear. At least from this small sample, it seems unknown is usually sandwiched between wake and/or movement.

Code
ids = unique(all_min_files_joined %>% 
          filter(type == "unknown") %>% 
          pull(SEQN))
for(subject in 1:length(ids)){
  subj = ids[subject]
  age = all_min_files_joined %>% 
    filter(SEQN == subj) %>% 
    slice(1) %>% 
    pull(age_in_years_at_screening)
  p = 
    all_min_files_joined %>%
    filter(SEQN == subj) %>%
    mutate(pred = factor(
      case_when(
        PAXPREDM == 1 ~ "Wake",
        PAXPREDM == 2 ~ "Sleep",
        PAXPREDM == 3 ~ "Nonwear",
        PAXPREDM == 4 ~ "Unknown"
      ),
      levels = c("Wake", "Sleep", "Nonwear", "Unknown")),
      minutes = paste0("Day ", PAXDAYM, " Unknown min = ", WTPRED_4)) %>% 
    ggplot(aes(
      x = time,
      y = PAXMTSM,
      col = pred,
      group = 1
    )) +
    geom_line() +
    facet_wrap(. ~ minutes, scales = "free_x", nrow = 3) +
    labs(x = "Time", y = "MIMS", title = paste0("SEQN ", subj, " ", age, " y.o.")) +
    theme_bw() +
    scale_x_datetime(labels = date_format("%H:%M")) +
    theme(legend.position = "bottom") +
    scale_color_manual(
      name = "Prediction",
      values = c(
        "Wake" = "#6388B4FF",
        "Sleep" = "#FFAE34FF",
        "Nonwear" = "#EF6F6AFF",
        "Unknown" = "#8CC2CAFF"
      )
    )+
    geom_hline(yintercept = 10.558)
  print(p)
}

Patterns of wake, sleep wear by age

Finally, we can look at patterns of sleep and wake wear by age.

Code
median = 
  all_1368 %>% 
  filter(age_in_years_at_screening >= 15) %>% 
  summarize(median = median(WTPRED_1)) %>% 
  pull(median)

plt = 
  all_1368 %>% 
  filter(age_in_years_at_screening >= 15) %>% 
  mutate(age_group = cut(age_in_years_at_screening, breaks=seq(0,85, 5), include.lowest = TRUE)) %>% 
  ggplot(aes(x = WTPRED_1, y = age_group, fill = age_group))+
  geom_density_ridges()+
  theme_bw() + 
  scale_fill_paletteer_d("ggthemes::Hue_Circle")+
  theme(legend.position = "none")+
  scale_x_continuous(breaks=seq(0, 1440, 120),
                     labels = seq(0, 24, 2))+
  labs(x = "Wake Wear (hr)", y = "Age Group")+
  geom_vline(xintercept = median, col = "darkgrey")

plt + 
  annotate("text", x = 22*60, y = 15, label = paste0("Overall median = ", round(median / 60, 1)))

Code
  # geom_text(aes(x = 22 * 60, y = 15, label = median))


median = 
  all_1368 %>% 
  filter(age_in_years_at_screening >= 15) %>% 
  summarize(median = median(WTPRED_2)) %>% 
  pull(median)

all_1368 %>% 
  filter(age_in_years_at_screening >= 15) %>% 
  mutate(age_group = cut(age_in_years_at_screening, breaks=seq(0,85, 5), include.lowest = TRUE)) %>% 
  ggplot(aes(x = WTPRED_2, y = age_group, fill = age_group))+
  geom_density_ridges()+
  theme_bw() + 
  scale_fill_paletteer_d("ggthemes::Hue_Circle")+
  theme(legend.position = "none")+
  scale_x_continuous(breaks=seq(0, 1440, 120),
                     labels = seq(0, 24, 2))+
  labs(x = "Sleep Wear (hr)", y = "Age Group")+
  geom_vline(xintercept = median, col = "darkgrey")+
  annotate("text", x = 22*60, y = 15, label = paste0("Overall median = ", round(median / 60, 1)))

Final criteria

Based on the above visualizations, we think the following criteria is reasonable for a day to be considered valid:

  • At least 1368 minutes (95% of a full day) of wake, sleep, or unknown
  • At least 7 hours (420 minutes) of wake wear
  • 0 total MIMS

We can look at the distribution of valid days per individual, based on this criteria.

Code
all_days_valid = 
  all_days %>% 
  filter(WTPRED_1 + WTPRED_2 + WTPRED_4 >= 1368) %>%
  filter(WTPRED_1 >= 7 * 60) %>% 
  filter(MIMS_weartotal > 0)

all_days_valid %>% 
  group_by(SEQN) %>% 
  count() %>% 
  ggplot(aes(x= n))+
  geom_bar(stat = "count", col = "black", fill = "#55AD89FF")+
  theme_bw() + 
  labs(x = "Number Valid Days", y = "Count", title = "Distribution of Valid Days per Individual")+
  scale_x_continuous(breaks=seq(1, 7, 1))

We can also plot the number of valid days per individual and mean daily MIMS. There odesn’t appear to be a relationship between valid days and mean MIMS, which is reassuring.

Code
all_days_valid %>% 
  group_by(SEQN) %>% 
  mutate(valid_days = n()) %>% 
  group_by(SEQN, gender, age_in_years_at_screening, valid_days) %>% 
  summarize(across(c(MIMS_weartotal, WTPRED_1, WTPRED_2), 
                   ~mean(.x))) %>% 
  ggplot(aes(x = valid_days, y = MIMS_weartotal, group = valid_days))+
  geom_boxplot()+
  theme_bw()+
  scale_x_continuous(breaks=seq(1, 7, 1))+
  labs(x = "Valid Days", y = "Mean Daily MIMS")

Code
all_days_valid %>% 
  group_by(SEQN) %>% 
  mutate(valid_days = n()) %>% 
  group_by(SEQN, gender, age_in_years_at_screening, valid_days) %>% 
  summarize(across(c(MIMS_weartotal, WTPRED_1, WTPRED_2), 
                   ~mean(.x))) %>% 
  ggplot(aes(x = valid_days, y = MIMS_weartotal))+
  geom_jitter(width = .3, alpha = .1)+
  theme_bw()+
  scale_x_continuous(breaks=seq(1, 7, 1))+
  labs(x = "Valid Days", y = "Mean MIMS")

Footnotes

  1. https://pubmed.ncbi.nlm.nih.gov/22936409/, https://pubmed.ncbi.nlm.nih.gov/23036822/↩︎

  2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9615811/↩︎

  3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8277083/↩︎

  4. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7012355/↩︎

  5. https://pubmed.ncbi.nlm.nih.gov/34308270/↩︎

  6. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9356340/↩︎